feat: add support to collect metrics using the OTEL library #927
Conversation
Looks really nice! Could you add some documentation about how to enable these metrics? I think we could create a new section like Operate and include this information about how to enable Prometheus/OTEL and which metrics are available.
Also, do you think that adding an e2e test could be useful? In KEDA core we cover the metrics using e2e tests too, so we could just add a similar e2e test here.
Thanks! Yeah, sure. Will draft some docs up and get the PR updated.
That is a good shout. Will get a test for both cases added in and update the PR.
Force-pushed from cd788ad to 2d3f779
I have added a brief overview of how to configure each exporter in an Operate.md doc as per your suggestion @JorTurFer and added in some e2e tests (using the core KEDA set as a reference). Let me know your thoughts :)
docs/operate.md (Outdated)
# Configuring metrics for the KEDA HTTP Add-on interceptor proxy

### Exportable metrics:
* **Pending request count** - the number of pending requests for a given host.
* **Total request count** - the total number of requests for a given host with method, path and response code attributes.

There are currently 2 supported methods for exposing metrics from the interceptor proxy service - via a Prometheus compatible metrics endpoint or by pushing metrics to an OTEL HTTP collector.

### Configuring the Prometheus compatible metrics endpoint
When configured, the interceptor proxy can expose metrics on a Prometheus compatible endpoint. This endpoint can be enabled by setting the `KEDA_HTTP_OTEL_PROM_EXPORTER_ENABLED` environment variable to `true` on the interceptor deployment (`true` by default) and by setting `KEDA_HTTP_OTEL_PROM_EXPORTER_PORT` to an unused port for the endpoint to be made available on (`2223` by default).
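For illustration only, a minimal sketch of how these variables might be set on the interceptor Deployment (the deployment and container names below are placeholders; the values shown are the documented defaults):

```yaml
# Hypothetical excerpt of the interceptor Deployment; only the env section matters here.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-add-ons-http-interceptor   # placeholder name
spec:
  template:
    spec:
      containers:
        - name: interceptor
          env:
            - name: KEDA_HTTP_OTEL_PROM_EXPORTER_ENABLED
              value: "true"   # enabled by default
            - name: KEDA_HTTP_OTEL_PROM_EXPORTER_PORT
              value: "2223"   # any unused port
```

Prometheus (or any compatible scraper) can then be pointed at that port on the interceptor service.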
### Configuring the OTEL HTTP exporter
When configured, the interceptor proxy can export metrics to an OTEL HTTP collector. The OTEL exporter can be enabled by setting the `KEDA_HTTP_OTEL_HTTP_EXPORTER_ENABLED` environment variable to `true` on the interceptor deployment (`false` by default). When enabled, the `KEDA_HTTP_OTEL_HTTP_COLLECTOR_ENDPOINT` environment variable must also be configured so the exporter knows which collector to send the metrics to (e.g. opentelemetry-collector.open-telemetry-system:4318). If the collector is exposed on an unsecured endpoint, you can set the `KEDA_HTTP_OTEL_HTTP_COLLECTOR_INSECURE` environment variable to `true` (`false` by default), which will disable client security on the exporter. If you need to provide any headers, such as authentication details, in order to use your OTEL collector, you can add them to the `KEDA_HTTP_OTEL_HTTP_HEADERS` environment variable. The frequency at which the metrics are exported can be configured by setting `KEDA_HTTP_OTEL_METRIC_EXPORT_INTERVAL` to the number of seconds you require between each export interval (`30` by default).
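As a similar hedged sketch, the OTEL exporter variables might be set like this (the collector endpoint and header value below are examples only, not defaults):

```yaml
# Hypothetical env entries for the interceptor container; endpoint and header
# values are examples only.
env:
  - name: KEDA_HTTP_OTEL_HTTP_EXPORTER_ENABLED
    value: "true"                                                 # off by default
  - name: KEDA_HTTP_OTEL_HTTP_COLLECTOR_ENDPOINT
    value: "opentelemetry-collector.open-telemetry-system:4318"   # required when enabled
  - name: KEDA_HTTP_OTEL_HTTP_COLLECTOR_INSECURE
    value: "true"                                                 # only if the collector is unsecured
  - name: KEDA_HTTP_OTEL_HTTP_HEADERS
    value: "Authorization=my-token"                               # example auth header
  - name: KEDA_HTTP_OTEL_METRIC_EXPORT_INTERVAL
    value: "30"                                                   # seconds between exports
```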
PTAL @tomkerkhove
@@ -122,3 +177,35 @@ func TestDeployKEDAHttpAddOn(t *testing.T) {
	t.Log(string(out))
	t.Log("KEDA Http Add-on deployed successfully using 'make deploy' command")
}

func TestSetupOpentelemetryComponents(t *testing.T) {
Could you add the removal of the collector too please? (during the teardown)
Force-pushed from 4a10aa4 to a8e5a05
	assert.True(t, ok, "interceptor_request_count_total is available")

	requestCount := getMetricsValue(val)
	assert.Equal(t, float64(1), requestCount)
I'd assert this using greater-or-equal, because the Prometheus metrics persist over time and the 2nd and 3rd tries can fail when the metric returns 2, 3, etc.
Yeah, that is a good shout - when I tested, I was tearing everything down after every run, so I only ever had a count of 1, but that won't always be true.
- "exec tail -f /dev/null"` | ||
) | ||
|
||
func TestMetricGeneration(t *testing.T) { |
This e2e test has failed due to compilation issues
Oops, left a small typo in there. Removed.
	assert.True(t, ok, "interceptor_request_count_total is available")

	requestCount := getMetricsValue(val)
	assert.Equal(t, float64(1), requestCount)
I'd assert this using greater-or-equal, because the Prometheus metrics persist over time and the 2nd and 3rd tries can fail when the metric returns 2, 3, etc.
Force-pushed from a8e5a05 to 60720b2
Will take a look at why the e2e tests failed when I get a sec.
Force-pushed from 71f7a82 to 7dd67c3
LGTM!
There are some static code errors, could you take a look please? https://github.com/kedacore/http-add-on/actions/runs/8453796744/job/23280002430?pr=927
Docs LGTM, nice work!
Force-pushed from 7dd67c3 to 65135b0
Thanks for the feedback. Remaining suggestions should be addressed and the static checks fixed. Just noticed I need to fix some conflicts too. Will sort that when I next get a sec 👍
Force-pushed from 65135b0 to adc1c8a
I have just rebased onto main and the unit/e2e tests look like they are passing fine, so we should be good now 👍 Edit: @JorTurFer it looks like I have a Snyk failure but can't view it via the details link. Is there another way to view the results?
We can ignore Snyk, it has transient failures from time to time xD I have marked it as okay manually :)
There is another PR ongoing and once that's merged, I'll update your PR and merge it. Thanks for your AWESOME contribution ❤️
config:
  exporters:
Suggested change:
image:
  repository: "otel/opentelemetry-collector-contrib"
config:
  exporters:
There is a potential breaking change that requires the change above, please commit it 🙏
https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/UPGRADING.md#0880-to-0890
Done @JorTurFer :)
Also, I've pushed a change to move the pending request count from a Gauge to an UpDownCounter (I noticed some memory issues with the Gauge). I left it in a separate commit rather than squashing locally so it's easier for you to review.
Head branch was pushed to by a user without write access
There are merge conflicts to be solved, could you solve them? (or enable me to update your PR xD)
…ible endpoint or by sending metrics to an OTEL http endpoint Signed-off-by: Joe Wogan <joe.wogan@10xbanking.com>
specify otel image for e2e tests Signed-off-by: Joe Wogan <joe.wogan@10xbanking.com>
Force-pushed from ceb81d5 to 06fb14a
Should be sorted now @JorTurFer :) Also, I've allowed maintainers to edit the PR; not sure why I didn't select this previously.
LGTM! ❤️
I didn't notice during the review that we aren't respecting the OTEL spec ENV vars: https://opentelemetry.io/docs/specs/otel/protocol/exporter/ I've created an issue to track it. As I guess that you are using the metrics, I'm sharing it with you because we will have to change them in future versions: #1031
Ah sugar, I completely missed that too 🤦. I'm happy to get a PR raised toward the end of next week to fix this if someone isn't already working on it.
It'd be nice. I just pinged you to make you aware of a possible breaking change, but if you are willing to update the code, that's awesome :)
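For reference, the linked OTEL exporter specification defines generic environment variables for OTLP exporters. A hedged sketch of what spec-aligned configuration might eventually look like (illustrative only; the interceptor currently reads the KEDA_HTTP_OTEL_* variables documented above, not these):

```yaml
# Illustrative only: spec-defined OTLP/SDK variables, not what the add-on supports today.
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://opentelemetry-collector.open-telemetry-system:4318"
  - name: OTEL_EXPORTER_OTLP_HEADERS
    value: "Authorization=my-token"
  - name: OTEL_METRIC_EXPORT_INTERVAL
    value: "30000"   # the spec expresses this interval in milliseconds
```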
Adding support to the interceptor proxy to allow for the collection of metrics via a Prometheus-compatible endpoint or via an OTEL HTTP collector. Resolves #910
Checklist
- README.md
- docs/ directory